Import libraries

Problem 1: Clustering

i) K-means

ii) GMM clustering

iii) HMM clustering

- Accuracy Summary of Problem 1

Approach       KMeans   GMM    HMM
Accuracy (%)   76.0     90.8   92.1

Comment: GMM and HMM outperform K-means because clusters need not be spherical in the N_sel-dimensional space, and GMM and HMM (both of which use Gaussian components) can capture more complicated cluster shapes through their covariance matrices. HMM in turn outperforms GMM because it exploits time-domain information by modeling the order of events, which the other two algorithms ignore.
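To make the spherical-cluster limitation concrete, here is a minimal numpy-only sketch of Lloyd's K-means (not the notebook's actual implementation; the 2-D data below is made up) run on elongated clusters, where the unweighted Euclidean assignment struggles while a full-covariance GMM could model the elongation:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means: implicitly assumes spherical clusters,
    since assignment uses unweighted Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the nearest center (Euclidean).
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute centers as cluster means (keep old center if a cluster empties).
        centers = np.stack([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Two elongated (non-spherical) clusters: K-means can split them badly,
# because it has no covariance to capture the elongation.
rng = np.random.default_rng(1)
a = rng.normal([0, 0], [5.0, 0.3], (100, 2))
b = rng.normal([0, 3], [5.0, 0.3], (100, 2))
X = np.vstack([a, b])
labels, centers = kmeans(X, 2)
```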

Problem 2: Spoken Digit Learning using HMM

Comment: Optimized the GMM-HMM parameters to obtain the highest accuracy, which is 100% for all digits.
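The training code is not reproduced here, but the classification step — score a test utterance under each digit's trained HMM and pick the argmax — can be sketched with a numpy-only log-domain forward algorithm. All parameters below (2 states, diagonal Gaussians, the two toy "digit" models) are made-up placeholders, not the notebook's learned values:

```python
import numpy as np

def log_gauss(x, mu, var):
    """Log density of a diagonal-covariance Gaussian, one row per state."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum(-1)

def hmm_loglik(obs, log_pi, log_A, mu, var):
    """Forward algorithm in the log domain: log p(obs | model)."""
    log_alpha = log_pi + log_gauss(obs[0], mu, var)
    for x in obs[1:]:
        # log-sum-exp over previous states, then add the emission log-prob.
        log_alpha = (np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0)
                     + log_gauss(x, mu, var))
    return np.logaddexp.reduce(log_alpha)

# Hypothetical 2-state models for two "digits"; classify by max log-likelihood.
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.7, 0.3], [0.2, 0.8]])
models = {
    0: (np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2))),
    1: (np.array([[5.0, 5.0], [6.0, 6.0]]), np.ones((2, 2))),
}
obs = np.full((10, 2), 5.5)  # an observation sequence near digit 1's states
pred = max(models, key=lambda d: hmm_loglik(obs, log_pi, log_A, *models[d]))
```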

Problem 3: Activity Recognition

Check the mean squared error (MSE) for each class on a single randomly chosen training sample to see the pattern first.

After trying several example trials, it can be observed that the intrinsic MSE level differs between classes, and each class's MSE varies within a specific range. (For instance, the red line (class 4) always stays low regardless of sample index and N value.)

Let us find the average MSE for each class using all training data together, concatenating the sequences in both input and output. We then identify a test sample's class by comparing its MSE to the per-class averages, i.e., by the distance between MSEs.

This effectively incorporates the average MSE value of each class into the loss function for classification.
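As I read the procedure, a minimal numpy sketch would be: fit one shared least-squares estimator A on the concatenated training data, record each class's average residual MSE, and label a test sample by the class whose average MSE is nearest to its own. The data below is synthetic and hypothetical, not the assignment's dataset:

```python
import numpy as np

def fit_linear(X, Y):
    """Least-squares estimate of A in Y ~ X @ A (one shared A, as in the text)."""
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A

def class_mse_profile(X, Y, labels):
    """Average residual MSE per class under the shared estimator A."""
    A = fit_linear(X, Y)
    resid = ((Y - X @ A) ** 2).mean(axis=1)
    return A, {c: resid[labels == c].mean() for c in np.unique(labels)}

def classify(x, y, A, profile):
    """Assign the class whose average MSE is nearest to this sample's MSE."""
    mse = ((y - x @ A) ** 2).mean()
    return min(profile, key=lambda c: abs(profile[c] - mse))

# Hypothetical data: class 0 is nearly linear (low MSE), class 1 is noisy (high MSE),
# mimicking the "intrinsic MSE level differs per class" observation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X @ np.ones((5, 3))
Y[100:] += rng.normal(scale=3.0, size=(100, 3))   # class 1 rows are noisier
labels = np.repeat([0, 1], 100)
A, profile = class_mse_profile(X, Y, labels)
```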

- Accuracy Summary for each N

w/ Training data

N     Class 1   Class 2   Class 3   Class 4   Average
10    63.5      36.6      63.0      61.1      56.1
30    54.3      46.9      63.9      60.7      56.5
50    76.0      44.0      68.0      54.7      53.6
70    54.3      39.5      56.2      63.6      53.4
80    58.3      28.8      63.9      53.4      51.1
90    43.5       8.23     72.6      58.3      45.7
110    5.65      2.06     95.4      36.8      35.0

w/ Test data

N     Class 1   Class 2   Class 3   Class 4   Average
10    70.2      37.8      67.5      62.0      59.4
30    57.9      49.8      62.3      57.6      56.9
50    52.4      49.1      66.3      53.9      55.4
70    61.2      44.3      62.3      62.3      57.5
80    65.7      36.8      63.5      51.5      54.4
90    45.3      13.1      69.8      54.9      45.8
110    5.83      3.44     96.0      36.7      35.5

As can be seen, the performance of each class reacts differently as N changes. About 57% average accuracy is achieved overall at N = 30, and accuracy exceeds 60% at N = 70 ~ 90 if class 2 is ignored.

Let us pick N = 30 and N = 80 and plot the confusion matrix for each.

This is the N = 30 case. Class 4 appears clearly distinct from classes 1-3, because samples from classes 1-3 are never misclassified as class 4. The likelihood of missing class 4 is also the lowest.

This is the N = 80 case. Similarly, class 4 stands apart, while classes 1, 3, and 4 all outperform class 2.
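The confusion-matrix plots themselves are not reproduced in this text; for reference, a minimal numpy sketch of how such a matrix can be tallied (the labels and predictions below are made up, with the last class standing in for the never-confused class 4):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, pred) pair
    return cm

# Hypothetical 0-indexed labels: class 3 (i.e. "class 4" in the text)
# is never predicted for the other classes' samples.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 1, 2, 2, 0, 3, 3])
cm = confusion_matrix(y_true, y_pred, 4)
```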

Why the model cannot learn well: in my opinion, the matrix A that we learn is a linear estimator, but the underlying function changes shape over time while we keep a single A for all time steps.

One way to improve while staying with linear estimators is to learn a separate A for each time index, instead of a single A, so as to capture the temporal environment at each time t.
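A minimal sketch of that idea, assuming sequences stacked as (n_sequences, T, features) and a hypothetical mapping that drifts with t: fit a separate least-squares A_t at every time index.

```python
import numpy as np

def fit_time_varying(X_seqs, Y_seqs):
    """One least-squares A_t per time index t, instead of a single shared A.
    X_seqs: (n_sequences, T, d_in), Y_seqs: (n_sequences, T, d_out)."""
    T = X_seqs.shape[1]
    return [np.linalg.lstsq(X_seqs[:, t], Y_seqs[:, t], rcond=None)[0]
            for t in range(T)]

def predict(x_seq, As):
    """Apply the time-indexed estimators along one sequence."""
    return np.stack([x_seq[t] @ As[t] for t in range(len(As))])

# Hypothetical data whose true mapping drifts over time: a single A cannot
# track it, but a per-timestep A_t recovers it exactly (no noise here).
rng = np.random.default_rng(2)
T, n, d = 8, 50, 3
X = rng.normal(size=(n, T, d))
true_A = [np.eye(d) * (t + 1) for t in range(T)]   # mapping changes with t
Y = np.stack([X[:, t] @ true_A[t] for t in range(T)], axis=1)
As = fit_time_varying(X, Y)
```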